DAOS-18859: Test Fix #18321

Draft
knard38 wants to merge 2 commits into release/2.8 from ckochhof/fix/release/2.8/daos-18859/patch-001

Contributor

@knard38 knard38 commented May 21, 2026

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

knard38 added 2 commits May 12, 2026 13:18
After commit 8f3ac4a switched daos_eq_poll from DAOS_EQ_WAIT to
DAOS_EQ_NOWAIT, two bugs were introduced in kv_put and kv_get:

1. Stale evp dereference on poll failure: when the inner spin loop
   exits with rc < 0 (poll error), evp is NOT updated by daos_eq_poll.
   The code then falls through to access evp->ev_error and call
   daos_kv_put/daos_kv_get with the stale pointer, which may point to
   an event still in-flight. This corrupts DAOS internal state and
   causes a SIGSEGV inside libdaos.so.

   Fix: add an explicit 'if (rc < 0) break;' guard after the inner
   spin loop, mirroring the original DAOS_EQ_WAIT code that had
   'if (rc < 0) break;' as the first check after polling.

2. Missing ev_error check in kv_put drain loop: the new NOWAIT-based
   drain loop stopped checking evp->ev_error for each drained event,
   silently ignoring I/O errors that occurred on in-flight requests.
   The original DAOS_EQ_WAIT loop checked 'rc = evp->ev_error' on
   every completion.

   Fix: restore the ev_error check in the drain loop.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
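
The two fixes described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the patch itself: `mock_event`, `mock_queue`, `mock_eq_poll`, and `drain_inflight` are hypothetical stand-ins that only imitate a non-blocking poll returning a positive count on completion, 0 when nothing is ready, and a negative value on error. The error values are arbitrary, and stopping at the first failed event is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT libdaos types. */
struct mock_event {
    int ev_error;                  /* completion status, 0 on success */
};

struct mock_queue {
    struct mock_event *pending[8]; /* completions not yet returned by a poll */
    int                head;       /* index of the next completion to return */
    int                count;      /* number of valid entries in pending[] */
    int                idle;       /* polls that report "nothing ready" first */
    int                fail_at;    /* poll call number that fails, 0 = never */
    int                polls;      /* poll calls made so far */
};

/*
 * Imitates a non-blocking poll: returns 1 and sets *evp when a completion is
 * available, 0 when nothing is ready, and a negative value on a poll error.
 * On error *evp is deliberately left untouched, as the commit message notes.
 */
static int mock_eq_poll(struct mock_queue *q, struct mock_event **evp)
{
    q->polls++;
    if (q->fail_at != 0 && q->polls == q->fail_at)
        return -5;                 /* arbitrary error value for the sketch */
    if (q->idle > 0) {
        q->idle--;
        return 0;
    }
    if (q->head == q->count)
        return 0;                  /* caller must not wait for more than is queued */
    *evp = q->pending[q->head++];
    return 1;
}

/* Drain `inflight` completions; returns 0 or the first error encountered. */
static int drain_inflight(struct mock_queue *q, int inflight)
{
    struct mock_event *evp = NULL;
    int                rc  = 0;

    assert(q != NULL);
    while (inflight > 0) {
        /* Inner spin loop: poll without blocking until something happens. */
        do {
            rc = mock_eq_poll(q, &evp);
        } while (rc == 0);

        /* Fix 1: on a poll error evp was not updated, so never touch it. */
        if (rc < 0)
            break;

        inflight--;

        /* Fix 2: surface the I/O status of every drained event. */
        rc = evp->ev_error;
        if (rc != 0)
            break;
    }
    return rc;
}
```

In this sketch a poll error leaves the queue untouched and is returned as-is, while a failed request is reported through its own `ev_error` value rather than being dropped during the drain.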
Add D_ERROR + fprintf(stderr) diagnostic messages in kv_put() and kv_get()
that fire when daos_eq_poll() returns rc < 0 (poll error) and the fix
prevents the stale evp dereference.

With the fix in place, the messages confirm the condition was caught and
handled safely — the code breaks out without dereferencing the stale
pointer, and the error propagates cleanly.

Quick-Functional: true
Test-repeat: 5
Test-tag: PoolAutotestTest,test_pool_autotest
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
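
A rough sketch of the kind of diagnostic this commit describes is shown below. It is an assumption-laden illustration only: `report_poll_error` and the message wording are hypothetical, and just the `fprintf` half is shown because `D_ERROR` is a DAOS logging macro that is not available in a standalone program.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the patch: prints a diagnostic when a
 * poll result is negative and returns rc unchanged so the caller can still
 * break out of its loop and propagate the error.
 */
static int report_poll_error(FILE *out, const char *caller, int rc)
{
    if (rc < 0)
        fprintf(out, "%s: event queue poll failed, rc=%d; not touching evp\n",
                caller, rc);
    return rc;
}
```

A caller could use it as `if (report_poll_error(stderr, "kv_put", rc) < 0) break;` right after the spin loop, so the message appears only on the error path.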
@github-actions

Errors are component not formatted correctly, Ticket number suffix is not a number. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments, Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18859:

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18321/1/execution/node/740/log

3 participants